Shared Linguistic Resources for the Meeting Domain

Authors

  • Meghan Lammie Glenn
  • Stephanie Strassel
Abstract

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Spring 2007 (RT-07) Rich Transcription Meeting Recognition Evaluation. In addition to making available large volumes of training data to research participants, LDC produced reference transcripts for the NIST Phase II Corpus and RT-07 conference room evaluation set, which represent a variety of subjects, scenarios and recording conditions. For the 18-hour NIST Phase II Corpus, LDC created quick transcripts, which include automatic segmentation and minimal markup. The 3-hour evaluation corpus required the creation of careful verbatim reference transcripts, including manual segmentation and rich markup. The 2007 effort marked the second year of using the XTrans annotation toolkit in the meeting domain. We describe the process of creating transcripts for the RT-07 evaluation, the advantages of using XTrans for each phase of transcription, and its positive impact on quality control and real-time transcription rates. This paper also describes the structure and results of a pilot consistency study that we conducted on the 3-hour test set. Finally, we present plans for further improvements to infrastructure and transcription methods.

Similar resources

Linguistic Resources for the Meeting Domain

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, and tools – to support the Spring 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation. In addition to making available large volumes of training data to research participants, LDC produced reference transcripts...

Linguistic Resources for Meeting Speech Recognition

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S confer...

Domain-Adaptive Information Extraction

We present in this paper the methodology developed within the PARADIME (Parameterizable Domain-Adaptive Information and Message Extraction) project for designing an Information Extraction (IE) system easily adaptable to new domains of application. For this we went for a strict separation of the (shallow) linguistic processing modules on the one hand and the domain-modeling modules on the other ...

The nature of working memory in linguistic, arithmetic and spatial integration processes

This paper reports the results of four dual-task experiments that were designed to determine the extent of domain-specificity of the verbal working memory resources used in linguistic integrations. To address this question, syntactic complexity was crossed in a 2 × 2 design with the complexity of a secondary task, which involved either (1) arithmetic integration processes and therefore relied on...

Optimal Planning for Water Resources Allocation (Case study: Hableh Roud Basin, Iran)

The world is facing severe challenges in meeting the rapidly growing demand for water resources. In addition, irrigation water, which is the largest use of water in most developing countries and in arid and semi-arid regions, will likely have to be diverted increasingly to meet the needs of households in urban areas and industry sectors whilst remaining a prime engine of agricultural growth. A Lin...

Publication date: 2007